Target Preference



Robust Preference Alignment via Directional Neighborhood Consensus

Mao, Ruochen, Shi, Yuling, Gu, Xiaodong, Wei, Jiaheng

arXiv.org Artificial Intelligence

Aligning large language models with human preferences is critical for creating reliable and controllable AI systems. A human preference can be visualized as a high-dimensional vector where different directions represent trade-offs between desired attributes (e.g., helpfulness vs. verbosity). Yet, because the training data often reflects dominant, average preferences, LLMs tend to perform well on common requests but fall short on specific, individual needs. This mismatch creates a preference coverage gap. Existing methods often address this through costly retraining, which may not generalize to the full spectrum of diverse preferences. This brittleness means that when a user's request reflects a nuanced preference deviating from the training data's central tendency, model performance can degrade unpredictably. To address this challenge, we introduce Robust Preference Selection (RPS), a post-hoc, training-free method that leverages directional neighborhood consensus. Instead of forcing a model to generate a response from a single, highly specific preference, RPS samples multiple responses from a local neighborhood of related preferences to create a superior candidate pool. It then selects the response that best aligns with the user's original intent. We provide a theoretical framework showing that our neighborhood generation strategy is provably superior to a strong baseline that also samples multiple candidates. Comprehensive experiments across three distinct alignment paradigms (DPA, DPO, and SFT) demonstrate that RPS consistently improves robustness over this baseline, achieving win rates of up to 69% on challenging preferences from under-represented regions of the preference space without any model retraining. Our work presents a practical, theoretically grounded solution for enhancing the reliability of preference-aligned models.
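
As a rough, informal sketch of the selection procedure described in this abstract: perturb the target preference direction to form a small neighborhood, generate one candidate response per neighbor, and keep the candidate that scores best under the original target preference. The `model.generate` and `reward_model.score` interfaces below are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def robust_preference_selection(model, prompt, target_pref, reward_model,
                                n_neighbors=4, noise_scale=0.1, seed=0):
    """Sketch of the RPS idea: sample responses under a neighborhood of
    preference directions, then pick the one scoring best under the user's
    original target preference."""
    rng = np.random.default_rng(seed)
    # Build a local neighborhood of preference directions around the target.
    prefs = [target_pref]
    for _ in range(n_neighbors):
        noisy = target_pref + noise_scale * rng.normal(size=target_pref.shape)
        prefs.append(noisy / np.linalg.norm(noisy))  # keep it directional (unit norm)
    # Generate one candidate response per preference in the neighborhood.
    candidates = [model.generate(prompt, preference=p) for p in prefs]
    # Select the candidate that best aligns with the ORIGINAL target preference.
    scores = [reward_model.score(prompt, c, target_pref) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

The key point of the sketch is that candidates are sampled under neighboring preferences but always scored against the user's original preference.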


An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

Neural Information Processing Systems

In the standard reinforcement learning (RL) setting, the primary goal is to obtain a policy that maximizes a cumulative scalar reward [Sutton and Barto, 2018].


Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time

Lan, Yifan, Cao, Yuanpu, Zhang, Weitong, Lin, Lu, Chen, Jinghui

arXiv.org Artificial Intelligence

Recently, Multimodal Large Language Models (MLLMs) have gained significant attention across various domains. However, their widespread adoption has also raised serious safety concerns. In this paper, we uncover a new safety risk of MLLMs: the output preference of MLLMs can be arbitrarily manipulated by carefully optimized images. Such attacks often generate contextually relevant yet biased responses that are neither overtly harmful nor unethical, making them difficult to detect. Specifically, we introduce a novel method, Preference Hijacking (Phi), for manipulating the MLLM response preferences using a preference hijacked image. Our method works at inference time and requires no model modifications. Additionally, we introduce a universal hijacking perturbation -- a transferable component that can be embedded into different images to hijack MLLM responses toward any attacker-specified preferences. Experimental results across various tasks demonstrate the effectiveness of our approach. The code for Phi is accessible at https://github.com/Yifan-Lan/Phi.
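
For intuition only, the following generic sketch shows how a bounded image perturbation could be optimized at inference time toward an attacker-specified preference. It is a standard projected, Adam-based optimization loop under an imperceptibility budget, not the authors' released implementation; `preference_loss` is a hypothetical callable scoring how far the MLLM's response on an image deviates from the target preference.

```python
import torch

def optimize_hijack_perturbation(image, preference_loss, epsilon=8 / 255,
                                 steps=200, lr=1e-2):
    """Generic sketch: optimize a bounded additive perturbation so that the
    model's output drifts toward a target preference (lower loss = closer)."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = preference_loss(torch.clamp(image + delta, 0.0, 1.0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the perturbation within an imperceptibility budget.
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)
    return torch.clamp(image + delta, 0.0, 1.0).detach()
```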


An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning

Lin, Qian, Liu, Zongkai, Mo, Danying, Yu, Chao

arXiv.org Artificial Intelligence

In recent years, significant progress has been made in multi-objective reinforcement learning (RL) research, which aims to balance multiple objectives by incorporating preferences for each objective. In most existing studies, specific preferences must be provided during deployment to indicate the desired policies explicitly. However, designing these preferences depends heavily on human prior knowledge, which is typically obtained through extensive observation of high-performing demonstrations with the expected behaviors. In this work, we propose a simple yet effective offline adaptation framework for multi-objective RL problems that does not assume handcrafted target preferences, requiring only a few demonstrations that implicitly indicate the preferences of the expected policies. Additionally, we demonstrate that our framework can naturally be extended to meet constraints on safety-critical objectives by utilizing safe demonstrations, even when the safety thresholds are unknown. Empirical results on offline multi-objective and safe tasks demonstrate the capability of our framework to infer policies that align with real preferences while meeting the constraints implied by the provided demonstrations.
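
As a loose illustration of the "preferences implied by demonstrations" idea, one could estimate a preference vector from the per-objective returns of the provided demonstrations. This is a toy heuristic for intuition, not the paper's algorithm.

```python
import numpy as np

def infer_preference_from_demos(demo_returns):
    """Toy heuristic: given per-objective returns of a few demonstrations
    (shape [n_demos, n_objectives]), estimate a preference vector proportional
    to the scale-normalized average return on each objective."""
    demo_returns = np.asarray(demo_returns, dtype=float)
    avg = demo_returns.mean(axis=0)
    # Normalize each objective by its spread across demos to make scales comparable.
    scale = demo_returns.std(axis=0) + 1e-8
    weights = np.clip(avg / scale, a_min=0.0, a_max=None)
    return weights / (weights.sum() + 1e-8)  # simplex-normalized preference

# Example: two demonstrations, three objectives (e.g., speed, energy, safety).
print(infer_preference_from_demos([[10.0, 2.0, 5.0], [12.0, 1.5, 6.0]]))
```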


Policy-regularized Offline Multi-objective Reinforcement Learning

Lin, Qian, Yu, Chao, Liu, Zongkai, Wu, Zifan

arXiv.org Artificial Intelligence

In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL. To this end, we extend the offline policy-regularized method, a widely adopted approach for single-objective offline RL problems, to the multi-objective setting. However, such methods face a new challenge in offline MORL settings, namely the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations via approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL in order to simultaneously learn a set of policies using a single policy network, thus reducing the computational cost of training a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.
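
For readers unfamiliar with preference-conditioned scalarized updates combined with policy regularization, the sketch below shows one plausible, TD3+BC-flavored actor loss. The `actor` and `critic` interfaces and the normalization constant are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def scalarized_regularized_actor_loss(critic, actor, obs, actions, preference,
                                      reg_weight=2.5):
    """Sketch of a preference-conditioned, policy-regularized actor update:
    scalarize multi-objective Q-values with the sampled preference, then
    regularize the policy toward the dataset actions. `critic(obs, a)` is
    assumed to return per-objective Q-values of shape [batch, n_objectives]."""
    pi = actor(obs, preference)                      # policy conditioned on the preference
    q_multi = critic(obs, pi)                        # [batch, n_objectives]
    q_scalar = (q_multi * preference).sum(dim=-1)    # linear scalarization
    bc = ((pi - actions) ** 2).mean()                # behavior-cloning regularizer
    # Normalize Q so the regularization weight keeps a consistent scale.
    lam = reg_weight / q_scalar.abs().mean().detach()
    return -(lam * q_scalar).mean() + bc
```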


Predicting A Creator's Preferences In, and From, Interactive Generative Art

Parikh, Devi

arXiv.org Artificial Intelligence

As a lay user creates an art piece using an interactive generative art tool, what, if anything, do the choices they make tell us about them and their preferences? These preferences could be in the specific generative art form (e.g., color palettes, density of the piece, thickness or curvature of any lines in the piece); predicting them could lead to a smarter interactive tool. Or they could be preferences in other walks of life (e.g., music, fashion, food, interior design, paintings) or attributes of the person (e.g., personality type, gender, artistic inclinations); predicting them could lead to improved personalized recommendations for products or experiences. To study this research question, we collect preferences from 311 subjects, both in a specific generative art form and in other walks of life. We analyze these preferences and train machine learning models to predict a subset of the preferences from the remaining ones. We find that preferences in the generative art form we studied cannot predict preferences in other walks of life better than chance (and vice versa). However, preferences within the generative art form are reliably predictive of each other.
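
The prediction setup described in this abstract can be illustrated with a small, self-contained sketch (the synthetic data and model choice are illustrative, not the paper's): treat one group of preference responses as features, a held-out preference as the label, and compare cross-validated accuracy against a majority-class chance baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def preference_predictability(X_prefs, y_pref, n_splits=5, seed=0):
    """Check whether one preference is predictable from the others better than
    chance, using cross-validated accuracy vs. a majority-class baseline."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    scores = cross_val_score(clf, X_prefs, y_pref, cv=n_splits)
    chance = np.bincount(y_pref).max() / len(y_pref)  # majority-class baseline
    return scores.mean(), chance

# Example with synthetic data: 311 subjects, 10 feature preferences, one binary label.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(311, 10))
y = rng.integers(0, 2, size=311)
print(preference_predictability(X, y))
```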